Refactor Ray Cluster/AppWrapper creation #650
Magnum opus for sure @Bobbins228 ! Will continue reviewing but some nitpicks and a question so far.
I'll give this another pass tomorrow. Good work!
    @imagePullPolicy.setter
    def imagePullPolicy(self, imagePullPolicy):
        self._imagePullPolicy = imagePullPolicy

    @property
    def securityContext(self):
        return self._securityContext

    @securityContext.setter
    def securityContext(self, securityContext):
        self._securityContext = securityContext

    @property
    def idleTimeoutSeconds(self):
        return self._idleTimeoutSeconds

    @idleTimeoutSeconds.setter
    def idleTimeoutSeconds(self, idleTimeoutSeconds):
        self._idleTimeoutSeconds = idleTimeoutSeconds

    @property
    def upscalingMode(self):
        return self._upscalingMode

    @upscalingMode.setter
    def upscalingMode(self, upscalingMode):
        self._upscalingMode = upscalingMode

    @property
    def env(self):
        return self._env

    @env.setter
    def env(self, env):
        self._env = env

    @property
    def envFrom(self):
        return self._envFrom

    @envFrom.setter
    def envFrom(self, envFrom):
        self._envFrom = envFrom

    @property
    def volumeMounts(self):
        return self._volumeMounts

    @volumeMounts.setter
    def volumeMounts(self, volumeMounts):
        self._volumeMounts = volumeMounts

    def to_dict(self):
        """Returns the model properties as a dict"""
        result = {}

        for attr, _ in six.iteritems(self.openapi_types):
            value = getattr(self, attr)
            if isinstance(value, list):
                result[attr] = list(
                    map(lambda x: x.to_dict() if hasattr(x, "to_dict") else x, value)
                )
            elif hasattr(value, "to_dict"):
                result[attr] = value.to_dict()
            elif isinstance(value, dict):
                result[attr] = dict(
                    map(
                        lambda item: (item[0], item[1].to_dict())
                        if hasattr(item[1], "to_dict")
                        else item,
                        value.items(),
                    )
                )
            else:
                result[attr] = value

        return result

    def to_str(self):
        """Returns the string representation of the model"""
        return pprint.pformat(self.to_dict())

    def __repr__(self):
        """For `print` and `pprint`"""
        return self.to_str()

    def __eq__(self, other):
        """Returns true if both objects are equal"""
        if not isinstance(other, V1AutoScalerOptions):
            return False

        return self.to_dict() == other.to_dict()

    def __ne__(self, other):
        """Returns true if both objects are not equal"""
        if not isinstance(other, V1AutoScalerOptions):
            return True

        return self.to_dict() != other.to_dict()
Are these auto-generated or have you created them by hand?
Can you give more info about how they were created?
My main concern is how we will keep these up to date as the Ray API spec evolves. I think this is definitely the correct approach; I just want to raise this concern now.
I wanted to auto-generate these, but from what I can see in the KubeRay repo there is no valid OpenAPI generator file. They should look like this, for reference: pet-store.yaml
So I based the model files on the way they should be auto-generated, going by what I can see in the Python K8s API and the specs from raycluster_types.go.
I am not a fan of how I generated these files, but given that the alternative is to re-create the base template within build_ray_cluster.py, I am not sure how we should proceed.
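For readers unfamiliar with the generated-client convention discussed above, here is a minimal sketch of a model class in that style. The class name `AutoScalerOptionsSketch` and the choice of fields are illustrative assumptions, not the PR's actual `V1AutoScalerOptions`; unlike the diff above, it avoids `six` and converts nested values recursively.

```python
import pprint


def _plain(value):
    """Recursively convert models, lists and dicts into plain Python data."""
    if hasattr(value, "to_dict"):
        return value.to_dict()
    if isinstance(value, list):
        return [_plain(item) for item in value]
    if isinstance(value, dict):
        return {key: _plain(item) for key, item in value.items()}
    return value


class AutoScalerOptionsSketch:
    """Hypothetical, trimmed-down model in the style of generated clients."""

    # Attribute name -> declared type; to_dict() iterates over this mapping.
    openapi_types = {
        "idleTimeoutSeconds": "int",
        "upscalingMode": "str",
        "env": "list[dict]",
    }

    def __init__(self, idleTimeoutSeconds=None, upscalingMode=None, env=None):
        self._idleTimeoutSeconds = idleTimeoutSeconds
        self._upscalingMode = upscalingMode
        self._env = env

    @property
    def idleTimeoutSeconds(self):
        return self._idleTimeoutSeconds

    @property
    def upscalingMode(self):
        return self._upscalingMode

    @property
    def env(self):
        return self._env

    def to_dict(self):
        """Return the model properties as a plain dict."""
        return {attr: _plain(getattr(self, attr)) for attr in self.openapi_types}

    def __repr__(self):
        return pprint.pformat(self.to_dict())

    def __eq__(self, other):
        if not isinstance(other, AutoScalerOptionsSketch):
            return False
        return self.to_dict() == other.to_dict()
```

Because equality is defined through `to_dict()`, two instances compare equal exactly when every field listed in `openapi_types` matches, which is the behaviour the hand-written models in this PR mirror.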
src/codeflare_sdk/cluster/cluster.py (outdated)
        write_to_file=write_to_file,
        appwrapper=is_appwrapper,
    )
    cluster = Cluster(cluster_config, is_retrieved_cluster=True)
Can we turn this into something like:
cluster = Cluster.cluster_from_k8
I'm not a huge fan of the is_retrieved_cluster flag leaking out into public APIs. It feels very awkward to me.
Can you elaborate a bit more on this?
That bool exists to prevent a Ray Cluster from being built in build_ray_cluster.py. Otherwise, when we create the limited ClusterConfiguration, the Cluster object is created but you would get duplicated print statements for "Written to: {output_file_name}", first from creating the limited cluster and then from creating a file from the retrieved cluster.
You can create a class function which has different logic which also returns a cluster. Something like:
cluster = Cluster.new_cluster_skip_create(cluster_config)
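A minimal sketch of that suggestion, assuming simplified stand-ins for the SDK classes: the `ClusterConfiguration` and `Cluster` below are hypothetical and only model the part under discussion. The alternative constructor bypasses `__init__` with `cls.__new__`, so the build step (and its "Written to" message) never runs for a retrieved cluster.

```python
class ClusterConfiguration:
    """Hypothetical stand-in holding only the fields this sketch needs."""

    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace


class Cluster:
    """Hypothetical stand-in for the SDK's Cluster class."""

    def __init__(self, config):
        # Normal path: building the resource happens on construction.
        self.config = config
        self.resource_yaml = self._build_resource()

    def _build_resource(self):
        # Placeholder for the real build step in build_ray_cluster.py.
        print(f"Written to: {self.config.name}.yaml")
        return f"generated yaml for {self.config.name}"

    @classmethod
    def new_cluster_skip_create(cls, cluster_config):
        """Return a Cluster without running the build step in __init__."""
        cluster = cls.__new__(cls)  # allocate the object, skip __init__
        cluster.config = cluster_config
        cluster.resource_yaml = None  # to be filled in from the retrieved yaml
        return cluster
```

With this shape, `Cluster(config)` keeps its public signature and no `is_retrieved_cluster` flag is exposed; only `get_cluster` would call `Cluster.new_cluster_skip_create(config)` and then assign the retrieved yaml.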
You are a hero, that is a great idea!
@KPostOffice WDYT of the new changes for get_cluster I have added?
When trying to initialise the Cluster object I was getting exception errors for not providing a ClusterConfiguration. I tried to think of a way around this and opted to set the CC to None when using get_cluster, which would allow me to set the config and resource_yaml after initialisation.
In turn, I added a warning if a user tries to specify ClusterConfiguration as None. I feel like this is still a bit crude and would welcome any further suggestions.
Do you want to spend some time pair programming this tomorrow? We can spitball some ideas
Yeah sounds good I can set up a meeting for later
Signed-off-by: Bobbins228 <[email protected]>
…ction doc Signed-off-by: Bobbins228 <[email protected]>
PR needs rebase.
Closing in favour of #751
Issue link
Closes: RHOAIENG-10385 and RHOAIENG-8846
What changes have been made
- ray_version: a variable for potential future automation
- image config variable to default to quay.io/rhoai/ray:2.23.0-py39-cu121
- create_resource
- get_cluster method to generate a new ClusterConfiguration with just the name and namespace of the cluster and retrieved yaml. Added is_appwrapper bool so that users can get AppWrappers/Ray Clusters
- _retrieved_cluster boolean for get_cluster command to avoid generating a "false" Ray Cluster via ClusterConfiguration
- Cluster Configuration documentation and added new doc for methods used when interacting with Ray Clusters/AppWrappers.
Verification steps
Setup
Notebook server ODH/RHOAI/Local
git clone https://github.com/project-codeflare/codeflare-sdk.git
poetry build - install if needed (pip install poetry)
pip install --force-reinstall dist/codeflare_sdk-0.0.0.dev0-py3-none-any.whl
Testing
All ClusterConfiguration parameters must be tested with the new cluster creation method.
Keep a special eye out for the following as they were the most complex to implement:
Automated Notebook testing should cover the functionality changed but I still suggest all parameters should be human verified.
Test the new and improved get_cluster() function.
NOTE: You can compare the original & retrieved clusters by setting write_to_file=True on ClusterConfiguration and get_cluster()
cluster = get_cluster(cluster_name=<name>, namespace=<namespace>, is_appwrapper=False, write_to_file=True)
cluster. methods
cluster.down() then cluster.up()
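One possible way to do the original-versus-retrieved comparison once both YAML files have been written is a plain text diff, sketched below with the standard library. The helper name `diff_yaml_files` is hypothetical, and the file locations are left to the caller because the PR does not state where write_to_file places its output.

```python
import difflib
from pathlib import Path


def diff_yaml_files(original_path, retrieved_path):
    """Return unified-diff lines between two text files (empty if identical)."""
    original = Path(original_path).read_text().splitlines()
    retrieved = Path(retrieved_path).read_text().splitlines()
    return list(
        difflib.unified_diff(
            original,
            retrieved,
            fromfile=str(original_path),
            tofile=str(retrieved_path),
            lineterm="",
        )
    )
```

An empty result means the two files match line for line. A textual diff also reports differences in key order or formatting, so a non-empty result does not by itself mean the clusters differ semantically.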
TODO
ImagePullSecrets # Done
V1RayCluster
Checks